NLP-NITMZ @ MSIR 2016 System for Code-Mixed Cross-Script Question Classification

Authors

  • Goutam Majumder
  • Partha Pakray
Abstract

This paper describes our approach to the Code-Mixed Cross-Script Question Classification task, which is Subtask 1 of MSIR 2016. MSIR is the Mixed Script Information Retrieval event held in conjunction with FIRE 2016, the 8th meeting of the Forum for Information Retrieval Evaluation. For this task, our team NLP-NITMZ submitted three system runs: i) using a direct feature set; ii) using direct and dependent feature sets; and iii) using a Naive Bayes classifier. The first system is our baseline; it is based on a direct feature set, which we generate from a group of keywords. When identifying question classes, the baseline system runs into ambiguity, meaning that one question is tagged with multiple classes. To deal with this ambiguity, we developed a second set of features, which we call the dependent feature set because its keywords work together with the direct feature set. The highest accuracy of our system is 78.88%, obtained using method-2 and submitted as run-3. Our other two runs have the same accuracy of 74.44%.
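
The interplay between the direct and dependent feature sets can be illustrated with a minimal Python sketch. It is not the authors' implementation: the keyword tables, the class labels, the tie-break rule, and the `classify` function are all hypothetical, chosen only to show how a second keyword set can resolve a question that the direct keywords tag with several classes.

```python
# Hypothetical keyword tables; the paper's actual keyword lists are not given here.
# A direct keyword maps to one or more candidate classes.
DIRECT_KEYWORDS = {
    "kothay": {"LOC"},
    "kobe": {"TEMP"},
    "ke": {"PER"},
    "koto": {"NUM", "MNY", "DIST"},  # ambiguous on its own
}

# A dependent keyword only matters when the direct keywords leave several candidates.
DEPENDENT_KEYWORDS = {
    "taka": "MNY",
    "dure": "DIST",
    "km": "DIST",
}


def classify(question, default="MISC"):
    """Return a single class label for a Roman-script question (illustrative only)."""
    tokens = question.lower().replace("?", " ").split()

    # Step 1: collect every class suggested by a direct keyword.
    candidates = set()
    for token in tokens:
        candidates |= DIRECT_KEYWORDS.get(token, set())

    if not candidates:
        return default
    if len(candidates) == 1:
        return next(iter(candidates))

    # Step 2: the question is ambiguous, so look for a dependent keyword
    # that selects one of the candidate classes.
    for token in tokens:
        label = DEPENDENT_KEYWORDS.get(token)
        if label in candidates:
            return label

    # Arbitrary but deterministic tie-break (an assumption of this sketch).
    return sorted(candidates)[0]


if __name__ == "__main__":
    print(classify("Kolkata theke Digha koto dure?"))
    print(classify("match kobe hobe?"))
```

In this sketch the direct keywords alone play the role of the baseline, and the dependent lookup is consulted only when the baseline yields more than one class.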

Similar resources

Code Mixed Cross Script Question Classification

With the growth of our society, one of the most affected aspects of our routine life is language. We tend to mix more than one language in our conversations, and mixing a regional language with English is an especially common practice. This mixing of languages is referred to as code mixing, where we mix different linguistic constituents such as phrases, proper nouns, morphemes etc. to come...

Amrita-CEN@MSIR-FIRE2016: Code-Mixed Question Classification using BoWs and RNN Embeddings

Question classification is a key task in many question answering applications. Nearly all previous work on question classification has used machine learning and knowledge-based methods. This working note presents an embedding-based Bag-of-Words method and a Recurrent Neural Network to achieve automatic question classification in code-mixed Bengali-English text. We build two systems that clas...

Overview of the Mixed Script Information Retrieval (MSIR) at FIRE-2016

The shared task on Mixed Script Information Retrieval (MSIR) was organized for the fourth year in FIRE-2016. The track had two subtasks. Subtask-1 was on question classification where questions were in code mixed Bengali-English and Bengali was written in transliterated Roman script. Subtask-2 was on ad-hoc retrieval of Hindi film song lyrics, movie reviews and astrology documents, where both t...

Modeling Classifier for Code Mixed Cross Script Questions

With the boom of the internet, social media text has been increasing day by day, and user-generated content (such as tweets and blogs) in Indian languages is written in Roman script for various socio-cultural and technological reasons. A majority of these posts are multilingual in nature and many involve code mixing where lexical items and grammatical features from two languages app...

Ensemble Classifier based approach for Code-Mixed Cross-Script Question Classification

With the increasing popularity of social media, people post updates that aid other users in finding answers to their questions. Most of the user-generated data on social media is in code-mixed or multi-script form, where the words are represented phonetically in a non-native script. We address the problem of question classification on social media data. We propose an ensemble classifier based ap...

Publication date: 2016